Diversifying Support Vector Machines for Boosting using Kernel Perturbation: Applications to Class Imbalance and Small Disjuncts

نویسندگان

  • Shounak Datta
  • Sayak Nag
  • Sankha Subhra Mullick
  • Swagatam Das
چکیده

The diversification (generating slightly varying separating discriminators) of Support Vector Machines (SVMs) for boosting has proven to be a challenge due to the strong learning nature of SVMs. Based on the insight that perturbing the SVM kernel may help in diversifying SVMs, we propose two kernel perturbation based boosting schemes where the kernel is modified in each round so as to increase the resolution of the kernel-induced Reimannian metric in the vicinity of the datapoints misclassified in the previous round. We propose a method for identifying the disjuncts in a dataset, dispelling the dependence on rule-based learning methods for identifying the disjuncts. We also present a new performance measure called Geometric Small Disjunct Index (GSDI) to quantify the performance on small disjuncts for balanced as well as class imbalanced datasets. Experimental comparison with a variety of state-of-the-art algorithms is carried out using the best classifiers of each type selected by a new approach inspired by multicriteria decision making. The proposed method is found to outperform the contending state-of-the-art methods on different datasets (ranging from mildly imbalanced to highly imbalanced and characterized by varying number of disjuncts) in terms of three different performance indices (including the proposed GSDI).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Remote Sensing and Land Use Extraction for Kernel Functions Analysis by Support Vector Machines with ASTER Multispectral Imagery

Land use is being considered as an element in determining land change studies, environmental planning and natural resource applications. The Earth’s surface Study by remote sensing has many benefits such as, continuous acquisition of data, broad regional coverage, cost effective data, map accurate data, and large archives of historical data. To study land use / cover, remote sensing as an effic...

متن کامل

AN INTRODUCTION TO KERNEL - BASED LEARNING ALGORITHMS 183 Empirical Risk

| This review provides an introduction to Support Vector Machines, Kernel Fisher Discriminant analysis and Kernel PCA, as examples for successful kernel based learning methods. We rst give a short background about VC theory and kernel feature spaces and then proceed to kernel based learning in supervised and unsupervised scenarios including practical and algorithmic considerations. We illustrat...

متن کامل

Fault diagnosis in a distillation column using a support vector machine based classifier

Fault diagnosis has always been an essential aspect of control system design. This is necessary due to the growing demand for increased performance and safety of industrial systems is discussed. Support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...

متن کامل

A prediction distribution of atmospheric pollutants using support vector machines, discriminant analysis and mapping tools (Case study: Tunisia)

Monitoring and controlling air quality parameters form an important subject of atmospheric and environmental research today due to the health impacts caused by the different pollutants present in the urban areas. The support vector machine (SVM), as a supervised learning analysis method, is considered an effective statistical tool for the prediction and analysis of air quality. The work present...

متن کامل

A prediction distribution of atmospheric pollutants using support vector machines, discriminant analysis and mapping tools (Case study: Tunisia)

Monitoring and controlling air quality parameters form an important subject of atmospheric and environmental research today due to the health impacts caused by the different pollutants present in the urban areas. The support vector machine (SVM), as a supervised learning analysis method, is considered an effective statistical tool for the prediction and analysis of air quality. The work present...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1712.08493  شماره 

صفحات  -

تاریخ انتشار 2017